ABSTRACT

Aims. We introduce an optimized data vector of cosmic shear measures ($\mathcal{N}$). This data vector has a high information content, is insensitive
to B-mode contamination, and shows only small correlations between data points at different angular scales.

Methods. We show that a data vector of the two-point correlation function (2PCF), hereafter denoted as $\boldsymbol{\xi}$, in general contains more information
on cosmological parameters than a data vector of the aperture mass dispersion, hereafter referred to as $\langle M_{\rm ap}^2\rangle$. The reason is that
$\langle M_{\rm ap}^2\rangle$ lacks the information of the convergence power spectrum ($P_\kappa$) on large angular scales, which is contained in $\boldsymbol{\xi}$. Therefore we
create a combined data vector $\mathcal{N}$, which retains the advantages of $\langle M_{\rm ap}^2\rangle$ and in addition is sensitive to the large-scale information of $P_\kappa$.
We compare the information content of the three data vectors by performing a detailed likelihood analysis and use ray-tracing simulations to
derive the covariance matrices. In the last part of the paper we contaminate all data vectors with B-modes on small angular scales and examine
their robustness against this contamination.

Results. The combined data vector $\mathcal{N}$ strongly improves constraints on cosmological parameters compared to $\langle M_{\rm ap}^2\rangle$. Although in the case of
a pure E-mode signal the information content of $\boldsymbol{\xi}$ is higher, in the more realistic case where B-modes are present the 2PCF data vector is
strongly contaminated and yields biased cosmological parameter estimates. $\mathcal{N}$ proves to be robust against this contamination. Furthermore, the
individual data points of $\mathcal{N}$ are much less correlated than those of $\boldsymbol{\xi}$, leading to an almost diagonal covariance matrix.
1. Introduction

Weak gravitational lensing by the large-scale structure (LSS),
called cosmic shear, has become a valuable tool for cosmology.
Since the first detection of cosmic shear in 2000 (Bacon et al.
2000; Kaiser et al. 2000; van Waerbeke et al. 2000; Wittman
et al. 2000), several surveys have been carried out with various
depths and widths. The latest results show the ability of
cosmic shear to constrain cosmological parameters, in particular
$\sigma_8$ (e.g. van Waerbeke et al. 2005; Semboloni et al. 2006;
Hoekstra et al. 2006; Schrabback et al. 2007; Hetterscheidt
et al. 2007; Massey et al. 2007). These constraints will improve
even more in the near future, when the VST Kilo-Degree Survey
will cover an area of 1700 deg$^2$ with a depth of 15 galaxies
per arcmin$^2$, enabling us to estimate the shear signal with
less than 1% statistical error. This improvement in measuring
cosmic shear should go along with an optimization of the data
analysis. It is desirable to extract as much information as possible
from the observational data and to derive constraints free of
any contamination. Currently, most cosmic shear surveys only
consider second-order shear statistics, for which all information
is contained in the power spectrum of the convergence
($P_\kappa$). Although $P_\kappa$ is not directly measurable, it is linearly
related to second-order cosmic shear measures (e.g. the two-point
correlation function and the aperture mass dispersion),
which can be estimated from the distorted ellipticities of the
observed galaxies. More precisely, all second-order measures
are filtered versions of $P_\kappa$ and the corresponding filter functions
determine how the information content of $P_\kappa$ is sampled.
It is the intention of this paper to compare several data vectors
of cosmic shear measures and to create an optimal data vector
with high information content, largely uncorrelated data points
and only little sensitivity to a possible B-mode contamination.
We first compare the information content of the two-point
correlation function (2PCF) and the aperture mass dispersion ($\langle M_{\rm ap}^2\rangle$).
We prove a general statement that a data vector consisting of
2PCF data points ($\boldsymbol{\xi}$) always gives tighter constraints on
cosmological models than a data vector consisting of $\langle M_{\rm ap}^2\rangle$
data points, and we confirm this by a likelihood analysis
of ray-tracing simulations. This result is not surprising, since
the 2PCF integrates over all scales of $P_\kappa$ and in particular collects
information on large angular scales which is not taken into account
by the aperture mass dispersion. Nevertheless, $\langle M_{\rm ap}^2\rangle$ has
important advantages. First, it can be used to separate E-modes
and B-modes (Crittenden et al. 2002; Schneider et al. 2002b);
more precisely, $\langle M_{\rm ap}^2\rangle$ is sensitive to E-modes only. Second, due
to its narrow filter function it provides highly localized information
on $P_\kappa$, implying that two different $\langle M_{\rm ap}^2\rangle$ data points are
much less correlated than two 2PCF data points. Third, $\langle M_{\rm ap}^2\rangle$ can
be extended more easily to higher-order statistics (Schneider et al.
2005). These advantages are valuable and should be maintained,
but the information content should be improved. Hence,
we extend the $\langle M_{\rm ap}^2\rangle$ data vector by one data point of $\xi_+(\theta_0)$,
which provides the large-scale information of $P_\kappa$, and call this
new data vector $\mathcal{N}$. We perform a likelihood analysis for $\mathcal{N}$,
examine its ability to constrain cosmological parameters and
compare it to the two aforementioned data vectors.
This paper is organized as follows: Sect. 2 summarizes the basic
theoretical background of the 2PCF and $\langle M_{\rm ap}^2\rangle$. Next we compare
the information content of these two second-order measures
and introduce the improvement to the $\langle M_{\rm ap}^2\rangle$ data vector
(Sect. 3). We perform a detailed likelihood analysis for the
three data vectors and present the results in Sect. 4 and Sect. 5.
In Sect. 6 we contaminate our shear data vectors with B-modes
and again perform the likelihood analysis to investigate
how significantly each data vector is influenced. Finally, in Sect. 7
we discuss the results and give our conclusions. One final
remark should be made on the notation: $\xi$ and $\langle M_{\rm ap}^2\rangle$ denote
theoretical quantities calculated from a given power spectrum,
whereas $\hat{\xi}$ and $\hat{\mathcal{M}}$ are estimators obtained by averaging over
many data points inside a bin. Vectors and matrices are written
in bold font.



2. Two-point statistics of cosmic shear 

In this section we briefly review the basics of two-point statis- 
tics, definitions of shear estimators and corresponding covari- 
ances, closely following the paper of Schneider et al. (2002a). 
For more details on these topics the reader is referred to 
Bartelmann & Schneider (2001) or, more recently, Schneider 
(2006). 

2.1. Two-point correlation function and aperture mass 
dispersion 

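To measure the shear signal we define $\boldsymbol{\theta}$ as the connecting vector
of two points and specify the tangential and cross-component
of the shear $\gamma$ as

$$\gamma_{\rm t} = -\mathrm{Re}\left(\gamma\,\mathrm{e}^{-2\mathrm{i}\varphi}\right) \quad\mathrm{and}\quad \gamma_\times = -\mathrm{Im}\left(\gamma\,\mathrm{e}^{-2\mathrm{i}\varphi}\right), \qquad (1)$$

where $\varphi$ is the polar angle of $\boldsymbol{\theta}$. The 2PCFs depend only on the
absolute value of $\boldsymbol{\theta}$ and are defined as

$$\xi_\pm(\theta) = \langle\gamma_{\rm t}\gamma_{\rm t}\rangle(\theta) \pm \langle\gamma_\times\gamma_\times\rangle(\theta). \qquad (2)$$
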
The observed shear field can be decomposed into a gradient
component (called E-mode) and a curl component (B-mode)
(Crittenden et al. 2002; Schneider et al. 2002b). B-modes are
considered to be a contamination of the pure lensing signal,
due to noise or unresolved systematics. The limited validity of
the Born approximation (Jain et al. 2000) or source redshift
clustering (Schneider et al. 2002b) can also create B-modes,
although these effects are small. Intrinsic alignment of source
galaxies is another possible explanation. Predictions of the
impact of this effect differ, but it can be overcome by
using photometric redshifts (King & Schneider 2003). For the
case of a general shear field consisting of E- and B-modes, the
convergence is also complex, $\kappa = \kappa_{\rm E} + \mathrm{i}\kappa_{\rm B}$, and it can be related
to the shear (Kaiser & Squires 1993) by



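$$\kappa_{\rm E}(\boldsymbol{\theta}) + \mathrm{i}\kappa_{\rm B}(\boldsymbol{\theta}) = \frac{1}{\pi}\int \mathrm{d}^2\theta'\, \mathcal{D}^*(\boldsymbol{\theta} - \boldsymbol{\theta}')\,\gamma(\boldsymbol{\theta}'), \qquad (3)$$

with the kernel $\mathcal{D}(\boldsymbol{\theta}) = -1/(\theta_1 - \mathrm{i}\theta_2)^2$.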
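The power spectra of the E-mode and B-mode can be defined
(Schneider et al. 2002b) using the Fourier transform $\hat{\kappa}$ of $\kappa$,

$$\langle\hat{\kappa}_{\rm E}(\boldsymbol{\ell})\hat{\kappa}_{\rm E}^*(\boldsymbol{\ell}')\rangle = (2\pi)^2\delta^{(2)}(\boldsymbol{\ell}-\boldsymbol{\ell}')P_{\rm E}(\ell),$$
$$\langle\hat{\kappa}_{\rm B}(\boldsymbol{\ell})\hat{\kappa}_{\rm B}^*(\boldsymbol{\ell}')\rangle = (2\pi)^2\delta^{(2)}(\boldsymbol{\ell}-\boldsymbol{\ell}')P_{\rm B}(\ell),$$
$$\langle\hat{\kappa}_{\rm E}(\boldsymbol{\ell})\hat{\kappa}_{\rm B}^*(\boldsymbol{\ell}')\rangle = (2\pi)^2\delta^{(2)}(\boldsymbol{\ell}-\boldsymbol{\ell}')P_{\rm EB}(\ell),$$

with $\delta^{(2)}(\boldsymbol{\ell})$ as the two-dimensional Dirac delta distribution.
The cross power spectrum $P_{\rm EB}$ is expected to vanish for a
statistically parity-invariant shear field. Note that $P_{\rm E}$ can be
related to the power spectrum of density fluctuations $P_\delta$ via
Limber's equation (Kaiser 1992, 1998)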
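
$$P_{\rm E}(\ell) = \frac{9H_0^4\Omega_{\rm m}^2}{4c^4}\int_0^{w_{\rm h}}\frac{\mathrm{d}w}{a^2(w)}\,P_\delta\!\left(\frac{\ell}{f_K(w)}, w\right)\left[\int_w^{w_{\rm h}}\mathrm{d}w'\,p_w(w')\frac{f_K(w'-w)}{f_K(w')}\right]^2, \qquad (8)$$

with $\ell$ as the Fourier mode on the sky, $w$ the comoving
coordinate, $w_{\rm h}$ the comoving coordinate of the horizon, $f_K(w)$
the comoving angular diameter distance, $a(w)$ the scale factor and $p_w$ the redshift
distribution of source galaxies. The 2PCFs depend on both power
spectra, $P_{\rm E}$ and $P_{\rm B}$,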
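
$$\xi_+(\theta) = \int_0^\infty\frac{\mathrm{d}\ell\,\ell}{2\pi}\,J_0(\ell\theta)\left[P_{\rm E}(\ell) + P_{\rm B}(\ell)\right], \qquad (9)$$

$$\xi_-(\theta) = \int_0^\infty\frac{\mathrm{d}\ell\,\ell}{2\pi}\,J_4(\ell\theta)\left[P_{\rm E}(\ell) - P_{\rm B}(\ell)\right], \qquad (10)$$

with $J_n$ denoting the n-th order Bessel function.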
Another second-order cosmic shear measure, the aperture mass
dispersion, was introduced by Schneider et al. (1998) and is
also related to the power spectrum. In contrast to the 2PCF,
$\langle M_{\rm ap}^2\rangle$ depends only on the E-mode and $\langle M_\perp^2\rangle$ only on the B-mode
power spectrum; hence the aperture mass statistics
provide a powerful tool to separate E- from B-modes,
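
$$\langle M_{\rm ap}^2\rangle(\theta) = \frac{1}{2\pi}\int_0^\infty\mathrm{d}\ell\,\ell\,P_{\rm E}(\ell)\,W_{\rm ap}(\theta\ell), \qquad (11)$$

$$\langle M_\perp^2\rangle(\theta) = \frac{1}{2\pi}\int_0^\infty\mathrm{d}\ell\,\ell\,P_{\rm B}(\ell)\,W_{\rm ap}(\theta\ell), \qquad (12)$$

with

$$W_{\rm ap}(\eta) = \frac{576\,J_4^2(\eta)}{\eta^4}. \qquad (13)$$
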
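From (9), (10) and (11) we see that the second-order shear measures
are filtered versions of $P_{\rm E}$ and $P_{\rm B}$. How the different filter
functions influence the information content of the corresponding
measures will be examined more closely in Sect. 3. In practice
the aperture mass dispersion is difficult to measure due to
gaps and holes in the data field, but it can be expressed in terms
of $\xi_+$ and $\xi_-$ as

$$\langle M_{\rm ap}^2\rangle(\theta) = \int_0^{2\theta}\frac{\mathrm{d}\vartheta\,\vartheta}{2\theta^2}\left[\xi_+(\vartheta)\,T_+\!\left(\frac{\vartheta}{\theta}\right) + \xi_-(\vartheta)\,T_-\!\left(\frac{\vartheta}{\theta}\right)\right]. \qquad (14)$$

The explicit calculation and the filter functions $T_\pm$ are given in
Schneider et al. (2002b).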

2.2. Estimators 

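Consider a sample of galaxies with angular positions $\boldsymbol{\theta}_i$. For
each pair of galaxies we define the connecting vector $\boldsymbol{\theta} = \boldsymbol{\theta}_i - \boldsymbol{\theta}_j$
and determine the tangential and cross-components of the ellipticities
($\epsilon_{\rm t}$ and $\epsilon_\times$) with respect to this connecting vector. From
these ellipticities we estimate the 2PCF in logarithmic bins of
$\vartheta$ with a logarithmic bin width $\Delta\vartheta$ (Schneider et al. 2002a).
If the bin width is sufficiently small, an unbiased estimator for
$\xi_\pm(\vartheta)$ is given by

$$\hat{\xi}_\pm(\vartheta) = \frac{1}{N_{\rm p}(\vartheta)}\sum_{ij}\left(\epsilon_{i{\rm t}}\epsilon_{j{\rm t}} \pm \epsilon_{i\times}\epsilon_{j\times}\right)\Delta_\vartheta(|\boldsymbol{\theta}_i - \boldsymbol{\theta}_j|), \qquad (15)$$

with $N_{\rm p}(\vartheta) = \sum_{ij}\Delta_\vartheta(|\boldsymbol{\theta}_i - \boldsymbol{\theta}_j|)$ as the number of galaxy pairs
inside a bin; $\Delta_\vartheta(|\boldsymbol{\theta}_i - \boldsymbol{\theta}_j|)$ is 1 if $|\boldsymbol{\theta}_i - \boldsymbol{\theta}_j|$ lies inside bin $\vartheta$, and 0
otherwise. An unbiased estimator of $\langle M_{\rm ap}^2\rangle$ can be calculated
from $\hat{\xi}_\pm(\vartheta)$ using (14),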



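$$\hat{\mathcal{M}}(\theta_k) = \sum_{i=1}^{I}\frac{\Delta\vartheta_i\,\vartheta_i}{2\theta_k^2}\left[\hat{\xi}_+(\vartheta_i)\,T_+\!\left(\frac{\vartheta_i}{\theta_k}\right) + \hat{\xi}_-(\vartheta_i)\,T_-\!\left(\frac{\vartheta_i}{\theta_k}\right)\right], \qquad (16)$$

where $I$ must be chosen such that the upper limit of the $I$-th bin
equals twice the value of $\theta_k$.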
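
The binned estimator of Eq. (15) can be illustrated with a brute-force sketch. This is not the code used for the analysis; the function name `estimate_2pcf`, the flat-sky treatment and the complex representation of the ellipticities are assumptions made for illustration only. Each pair is counted once, which leaves the ratio in Eq. (15) unchanged.

```python
import numpy as np

def estimate_2pcf(pos, eps, bin_edges):
    """Brute-force estimate of xi_+ and xi_- in the spirit of Eq. (15).

    pos       : (n, 2) array of galaxy positions (flat sky, any angular unit)
    eps       : (n,) complex array of ellipticities, eps = e1 + i*e2
    bin_edges : increasing array of bin boundaries in the same unit as pos
    """
    pos = np.asarray(pos, dtype=float)
    eps = np.asarray(eps, dtype=complex)
    i, j = np.triu_indices(len(pos), k=1)      # every pair exactly once
    sep = pos[j] - pos[i]                      # connecting vectors
    dist = np.hypot(sep[:, 0], sep[:, 1])
    phi = np.arctan2(sep[:, 1], sep[:, 0])     # polar angle of the connecting vector
    rot = np.exp(-2j * phi)
    ei, ej = -eps[i] * rot, -eps[j] * rot      # real part: tangential, imaginary part: cross
    tt = ei.real * ej.real
    xx = ei.imag * ej.imag
    which = np.digitize(dist, bin_edges) - 1   # bin index of each pair
    nbins = len(bin_edges) - 1
    ok = (which >= 0) & (which < nbins)        # drop pairs outside the binned range
    n_pairs = np.bincount(which[ok], minlength=nbins)
    sum_p = np.bincount(which[ok], weights=(tt + xx)[ok], minlength=nbins)
    sum_m = np.bincount(which[ok], weights=(tt - xx)[ok], minlength=nbins)
    with np.errstate(invalid="ignore", divide="ignore"):
        xi_p = sum_p / n_pairs                 # empty bins yield NaN
        xi_m = sum_m / n_pairs
    return xi_p, xi_m, n_pairs
```

For logarithmic bins as used in the text, `bin_edges` could be built with `np.logspace`. The pairwise loop scales quadratically with the number of galaxies, so a real survey analysis would need a tree-based or gridded pair count instead.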

2.3. Covariances 

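The covariance of a shear estimator is important for characterizing
the amount of information it contains. For the 2PCF
it is defined as

$$\mathbf{C}_\xi(\vartheta_i, \vartheta_j) := \left\langle\left(\hat{\xi}_\pm(\vartheta_i) - \xi_\pm(\vartheta_i)\right)\left(\hat{\xi}_\pm(\vartheta_j) - \xi_\pm(\vartheta_j)\right)\right\rangle. \qquad (17)$$

Assuming a Gaussian shear field, the covariance of the 2PCF
can be calculated analytically (Schneider et al. 2002a; Joachimi
et al. 2007). As one already sees from (17), the 2PCF has four
different covariances, denoted as $C_{++}$, $C_{+-}$, $C_{-+}$, $C_{--}$. Only
three of them are independent since $C_{+-}(\vartheta_i, \vartheta_j) = C_{-+}(\vartheta_j, \vartheta_i)$.
The covariance $\mathbf{C}_{\mathcal{M}}(\theta_k, \theta_l)$ of $\hat{\mathcal{M}}$ is defined analogously. Using
(16) we can express $\mathbf{C}_{\mathcal{M}}$ in terms of $\mathbf{C}_\xi$,

$$\mathbf{C}_{\mathcal{M}}(\theta_k, \theta_l) = \frac{1}{4}\sum_{i=1}^{I}\sum_{j=1}^{J}\frac{\Delta\vartheta_i\,\Delta\vartheta_j\,\vartheta_i\,\vartheta_j}{\theta_k^2\,\theta_l^2}\sum_{m,n=+,-}T_m\!\left(\frac{\vartheta_i}{\theta_k}\right)T_n\!\left(\frac{\vartheta_j}{\theta_l}\right)C_{mn}(\vartheta_i, \vartheta_j). \qquad (18)$$

Similar to (16), $I$ ($J$) is chosen such that the upper limit of the
$I$-th ($J$-th) bin equals twice $\theta_k$ ($\theta_l$).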

3. The new data vector N 

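Consider two data vectors, namely

$$\boldsymbol{\xi} = \begin{pmatrix}\boldsymbol{\xi}_+ \\ \boldsymbol{\xi}_-\end{pmatrix} \quad\mathrm{with}\quad \boldsymbol{\xi}_\pm = \begin{pmatrix}\xi_\pm(\vartheta_1) \\ \vdots \\ \xi_\pm(\vartheta_m)\end{pmatrix} \qquad (19)$$

for the 2PCF and

$$\langle\boldsymbol{M}_{\rm ap}^2\rangle = \begin{pmatrix}\langle M_{\rm ap}^2\rangle(\theta_1) \\ \vdots \\ \langle M_{\rm ap}^2\rangle(\theta_n)\end{pmatrix} \qquad (20)$$

for the aperture mass dispersion. The relation (16) can also be
written in terms of data vectors and an $n \times 2m$ transfer matrix $\mathbf{A}$,

$$\langle\boldsymbol{M}_{\rm ap}^2\rangle = \mathbf{A}\,\boldsymbol{\xi} = \left(\mathbf{A}_+\;\mathbf{A}_-\right)\begin{pmatrix}\boldsymbol{\xi}_+ \\ \boldsymbol{\xi}_-\end{pmatrix}, \qquad (21)$$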



with $\mathbf{A}_+$ denoting the part of $\mathbf{A}$ referring to $\boldsymbol{\xi}_+$ and $\mathbf{A}_-$ the
corresponding part referring to $\boldsymbol{\xi}_-$. Eq. (21) implies that the
information content of $\langle M_{\rm ap}^2\rangle$ is less than or equal to that of $\boldsymbol{\xi}$.
The amount of information can be equal only if the
rank of $\mathbf{A}$ equals the dimension of $\boldsymbol{\xi}$, hence rank $\mathbf{A} = 2m$. We
explicitly prove these statements in the Appendix. For the case
of $\boldsymbol{\xi}$ and $\langle M_{\rm ap}^2\rangle$, $n < m$ holds, which can be seen from (16).
Therefore the relation (21) is not invertible and the information
content of $\langle M_{\rm ap}^2\rangle$ is smaller than that of $\xi_\pm$. The fact that
$\xi_\pm$ contains more information on cosmological parameters can
also be explained by looking at the filter functions $J_0$, $J_4$ and
$W_{\rm ap}$ relating the corresponding second-order shear measures to
the underlying power spectrum. $\xi_+$ probes the power spectrum
over a broad range of Fourier modes and also collects information
on scales larger than the survey size. In contrast, the aperture
mass dispersion provides a highly localized probe of the power spectrum
and does not contain this large-scale information. Hence, due
to the limited field size of a survey, the information content of
$\langle M_{\rm ap}^2\rangle$ is smaller than that of $\xi_\pm$. These considerations lead to
the idea of modifying $\langle M_{\rm ap}^2\rangle$ by adding one data point of $\xi_+(\theta_0)$.
We define the new data vector $\mathcal{N}$ as

$$\mathcal{N} = \begin{pmatrix}\langle M_{\rm ap}^2\rangle(\theta_1) \\ \vdots \\ \langle M_{\rm ap}^2\rangle(\theta_n) \\ \xi_+(\theta_0)\end{pmatrix} \qquad (22)$$
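and the corresponding covariance matrix reads

$$\mathbf{C}_{\mathcal{N}} = \begin{pmatrix}\mathbf{C}_{\mathcal{M}} & \mathbf{C}(\hat{\mathcal{M}}, \hat{\xi}_+) \\ \mathbf{C}(\hat{\mathcal{M}}, \hat{\xi}_+)^{\rm T} & C(\hat{\xi}_+, \hat{\xi}_+)\end{pmatrix}. \qquad (23)$$

The upper left $n \times n$ matrix is exactly $\mathbf{C}_{\mathcal{M}}$ and the entry for
$C(\hat{\xi}_+, \hat{\xi}_+)$ is taken from the corresponding covariance matrix of
the correlation function. The cross terms can be calculated using
(16) and read

$$C\left(\hat{\mathcal{M}}(\theta_k), \hat{\xi}_+(\theta_0)\right) = \sum_{i=1}^{I}\frac{\Delta\vartheta_i\,\vartheta_i}{2\theta_k^2}\left[T_+\!\left(\frac{\vartheta_i}{\theta_k}\right)C_{++}(\vartheta_i, \theta_0) + T_-\!\left(\frac{\vartheta_i}{\theta_k}\right)C_{-+}(\vartheta_i, \theta_0)\right]. \qquad (24)$$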
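
Because the combined data vector is a linear function of the 2PCF data vector, its covariance follows from linear propagation. The sketch below illustrates this under stated assumptions: the function name `combined_covariance` is hypothetical, the 2PCF covariance is assumed to be ordered with the $\xi_+$ block first, and the transfer matrix is passed in as given (its entries would be the bin weights of Eq. (16), which are not evaluated here).

```python
import numpy as np

def combined_covariance(A, C_xi, idx0):
    """Covariance of the combined vector (A @ xi, xi[idx0]) by linear propagation.

    A     : (n, 2m) transfer matrix mapping the 2PCF data vector to the
            aperture mass dispersion data vector, as in Eq. (21)
    C_xi  : (2m, 2m) covariance of the 2PCF data vector (xi_+ block first)
    idx0  : index of the single xi_+ data point appended to the vector
    """
    A = np.asarray(A, dtype=float)
    C_xi = np.asarray(C_xi, dtype=float)
    select = np.zeros((1, C_xi.shape[0]))
    select[0, idx0] = 1.0                # row that picks out xi_+(theta_0)
    T = np.vstack([A, select])           # (n+1, 2m) map from xi to the combined vector
    return T @ C_xi @ T.T
```

The upper left block of the result is `A @ C_xi @ A.T`, the last diagonal entry is the 2PCF variance at the chosen bin, and the last column holds the cross terms, matching the block structure of Eq. (23).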



4. Calculating data vectors and covariances 

The data vectors $\boldsymbol{\xi}$, $\langle M_{\rm ap}^2\rangle$ and $\mathcal{N}$ are directly calculated from the
power spectrum of density fluctuations $P_\delta$ using (8) to obtain
$P_{\rm E}$ and then applying either (9), (10) or (11) depending on
the desired cosmic shear measure. To derive $P_\delta$ we assume an
initial Harrison-Zeldovich power spectrum ($P_\delta(k) \propto k^n$ with
$n = 1$). The transition to today's power spectrum employs the
transfer function described in Bardeen et al. (1986), and for
the calculation of the nonlinear evolution we use the fitting
formula of Smith et al. (2003). In contrast, the covariances
are obtained from ray-tracing simulations. The N-body simulation
used for the ray-tracing experiment was carried out by
the Virgo Consortium (Jenkins et al. 2001); for details of the
ray-tracing algorithm see Ménard et al. (2003). $\mathbf{C}_\xi$ is calculated
by field-to-field variation of 36 ray-tracing realisations,
where each field has a side length of 4.27 degrees. The intrinsic
ellipticity noise is $\sigma_\epsilon = 0.3$ and the number density of source
galaxies is given by $n = 25/\mathrm{arcmin}^2$. From $\mathbf{C}_\xi$ we calculate
$\mathbf{C}_{\mathcal{M}}$ and $\mathbf{C}_{\mathcal{N}}$ according to (18) and (23). The cosmology of the
ray-tracing simulations, i.e. our fiducial cosmological model, is
a flat $\Lambda$CDM model with $\Omega_{\rm m} = 0.3$, $\sigma_8 = 0.9$, $h = 0.7$ and
$\Gamma = 0.172$.



4.1. Difficulties with covariances 
4.1.1. Underestimation of $\mathbf{C}_{\mathcal{M}}$

Kilbinger et al. (2006) have shown that $\langle M_{\rm ap}^2\rangle(\theta)$ is biased for
small $\theta$ when calculated from the 2PCF using (16). This is due
to the lack of 2PCF data points on very small angular scales,
which causes a small-scale cutoff in the integral of (14). In our
specific case the $\langle M_{\rm ap}^2\rangle$ data vector is not affected by this bias
because we calculate it directly from the power spectrum $P_{\rm E}$.
However, since $\mathbf{C}_{\mathcal{M}}$ and $\mathbf{C}_{\mathcal{N}}$ are calculated from the covariance
of the 2PCF, they are certainly affected by this problem. In this
subsection we determine the $\theta$-range on which we can calculate
$\mathbf{C}_{\mathcal{M}}$ with sufficient accuracy; the corresponding data vector of
the aperture mass dispersion will be restricted to this range.
Fig. 1 shows $\langle M_{\rm ap}^2\rangle$ calculated directly from the power spectrum
compared with $\langle M_{\rm ap}^2\rangle$ calculated from $\xi_\pm$ using (16). We
assume that the deviation shown here is a good approximation
for the bias in $\mathbf{C}_{\mathcal{M}}$ and we require an accuracy of 5% to accept
a $\theta$-value for the $\langle M_{\rm ap}^2\rangle$ data vector. This criterion restricts the
data vector to a $\theta$-range of 2.25' to 100.0', whereas the 2PCF data
vector is measured from 0.2' to 200.0'.

Fig. 1. This plot shows $\langle M_{\rm ap}^2\rangle$ calculated directly from the power spectrum
(11) compared to $\langle M_{\rm ap}^2\rangle$ calculated from $\xi_\pm$ (16). Because
we cannot estimate the 2PCF down to arbitrarily small angular
scales (here $\vartheta_{\rm min} = 0.2'$), the calculated $\langle M_{\rm ap}^2\rangle$ values are underestimated.
The same problem occurs when calculating $\mathbf{C}_{\mathcal{M}}$ from $\mathbf{C}_\xi$. The
$\theta$-range with a deviation smaller than 5% extends from 2.25' to 100.0'.

4.1.2. Inversion of the covariance matrix

A second difficulty in the context of covariance matrices is outlined
in Hartlap et al. (2007). Inverting an unbiased estimate of a
covariance matrix yields a biased estimate of the inverse; this bias
can be removed by applying a correction factor. According to
Hartlap et al. (2007) the correction factor depends on the ratio
of the number of bins ($B$) to the number of independent realisations
($N$) from which the covariance matrix is estimated. An unbiased
estimate of the inverse covariance matrix is

$$\hat{\mathbf{C}}^{-1}_{\rm unbiased} = \frac{N - B - 2}{N - 1}\,\hat{\mathbf{C}}^{-1} = \left(1 - \frac{B + 1}{N - 1}\right)\hat{\mathbf{C}}^{-1}. \qquad (25)$$

Hartlap et al. (2007) have proven the validity of this correction
factor for the case of Gaussian errors and statistically independent
data vectors. These two assumptions are violated when estimating
the covariance matrix from ray-tracing simulations. In
order to check whether the correction factor corrects the error
in our ray-tracing covariance matrices, we perform the following
experiment. We add different Gaussian noise to the ellipticities
of the galaxies, which are taken from the 36 independent
realisations of the ray-tracing simulations, and thereby increase
the number of independent realisations. We hold the binning of
the matrices constant, calculate covariances for 36, 108, 216,
360, 720, 1080, 1440, 1800 independent realisations and plot
$1/\mathrm{tr}\,\hat{\mathbf{C}}^{-1}$ as a function of the ratio $B/N$ (Fig. 2). Note that this
method only creates multiple realisations of Gaussian noise on
the galaxy ellipticities and does not increase the number of realisations
which determine the cosmic variance part of the covariance
matrix. Therefore, this method only partly checks for
the non-Gaussianity of the errors in a ray-tracing covariance
matrix; nevertheless, the impact of statistically dependent data
vectors is fully taken into account. We find the same linear behaviour
of the bias as Hartlap et al. (2007); therefore we are confident
that the correction factor is able to unbias our inverse covariance
matrices. Using the corrected inverse covariance matrix we ensure
that the log-likelihood is also unbiased; nevertheless, any
non-linear transformation of the log-likelihood will again introduce
a bias which influences the results and must be examined.

Fig. 2. Inverting an estimated covariance matrix yields a bias which
depends on the ratio of the number of bins to the number of independent
realisations of the ray-tracing simulations ($B/N$). This dependence
is linear and we correct the bias for all three inverted covariance
matrices $\mathbf{C}_\xi$, $\mathbf{C}_{\mathcal{N}}$, $\mathbf{C}_{\mathcal{M}}$. We plot $1/\mathrm{tr}(\hat{\mathbf{C}}^{-1})$ for the corrected and
uncorrected values, where the lines indicate a fit through the data. All
covariances are binned logarithmically; $\mathbf{C}_\xi$ consists of 70 bins covering
a range of 0.2' to 200.0'; $\mathbf{C}_{\mathcal{N}}$ and $\mathbf{C}_{\mathcal{M}}$ cover the range 2.25' to 100.0' with
21 bins for $\mathbf{C}_{\mathcal{N}}$ and 20 bins for $\mathbf{C}_{\mathcal{M}}$.
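
As a minimal sketch of how the correction of Eq. (25) could be applied in practice (the function name `debiased_inverse_covariance` is hypothetical and not taken from the paper):

```python
import numpy as np

def debiased_inverse_covariance(cov_hat, n_real):
    """Invert an estimated covariance matrix and rescale it as in Eq. (25).

    cov_hat : (B, B) covariance estimated from n_real independent realisations
    n_real  : number of independent realisations N
    """
    cov_hat = np.asarray(cov_hat, dtype=float)
    n_bins = cov_hat.shape[0]
    if n_real <= n_bins + 2:
        # the factor (N - B - 2) / (N - 1) would be zero or negative
        raise ValueError("need more realisations than bins + 2")
    factor = (n_real - n_bins - 2) / (n_real - 1)
    return factor * np.linalg.inv(cov_hat)
```

For example, 36 realisations and 20 bins give a factor of 14/35 = 0.4, so the uncorrected inverse would overestimate the inverse covariance by a factor of 2.5.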



5. Likelihood analysis 

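We define the posterior likelihood ($P_{\rm PL}$) for the case of a 2PCF
data vector as

$$P_{\rm PL}(\boldsymbol{\pi}|\hat{\boldsymbol{\xi}}) = \frac{P_{\rm L}(\hat{\boldsymbol{\xi}}|\boldsymbol{\pi})\,P_{\rm Prior}(\boldsymbol{\pi})}{P_{\rm E}(\hat{\boldsymbol{\xi}})}, \qquad (26)$$

where $\boldsymbol{\pi}$ denotes the parameter vector of the $\Lambda$CDM model assumed
in our likelihood analysis. $P_{\rm Prior}$ usually contains knowledge
on the parameter vector from other experiments. In our
case we assume flat priors with cutoffs, which means $P_{\rm Prior}$ is
constant for all parameters inside a fixed interval and $P_{\rm Prior} = 0$
for parameters outside the interval. The evidence $P_{\rm E}$ (not to be
confused with the E-mode power spectrum) is just the
normalization, obtained by integrating the probability over the
whole parameter space. The likelihood $P_{\rm L}$ is defined as

$$P_{\rm L}(\hat{\boldsymbol{\xi}}|\boldsymbol{\pi}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det\mathbf{C}_\xi}}\exp\left[-\frac{1}{2}\chi^2(\hat{\boldsymbol{\xi}}, \boldsymbol{\pi})\right], \qquad (27)$$

with the $\chi^2$-function

$$\chi^2(\hat{\boldsymbol{\xi}}, \boldsymbol{\pi}) = \left(\boldsymbol{\xi}(\boldsymbol{\pi}) - \boldsymbol{\xi}^{\rm f}\right)^{\rm T}\mathbf{C}_\xi^{-1}\left(\boldsymbol{\xi}(\boldsymbol{\pi}) - \boldsymbol{\xi}^{\rm f}\right). \qquad (28)$$

$\boldsymbol{\xi}^{\rm f}$ denotes the data vector corresponding to our fiducial model,
whereas $\boldsymbol{\xi}(\boldsymbol{\pi})$ varies according to the considered parameter
space. To compare the information content of $\boldsymbol{\xi}$, $\langle M_{\rm ap}^2\rangle$ and $\mathcal{N}$ we
calculate the posterior likelihood in several parameter spaces
and illustrate the result by contour plots. Smaller contours correspond
to a higher information content.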
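
The grid evaluation described above can be sketched as follows. The names `log_likelihood`, `posterior_on_grid` and the callable `model` (which returns the theoretical data vector for a pair of parameter values) are hypothetical; a flat prior over the grid is assumed, and the constant prefactor of Eq. (27) drops out in the normalisation.

```python
import numpy as np

def log_likelihood(model_vec, fid_vec, inv_cov):
    """Gaussian log-likelihood up to an additive constant, i.e. -chi^2 / 2 of Eq. (28)."""
    d = np.asarray(model_vec, dtype=float) - np.asarray(fid_vec, dtype=float)
    return -0.5 * d @ inv_cov @ d

def posterior_on_grid(model, grid1, grid2, fid_vec, inv_cov):
    """Normalised posterior on a 2D parameter grid, assuming a flat prior over the grid."""
    logp = np.array([[log_likelihood(model(p1, p2), fid_vec, inv_cov)
                      for p2 in grid2] for p1 in grid1])
    post = np.exp(logp - logp.max())     # subtract the maximum for numerical stability
    return post / post.sum()
```

In an actual analysis `model` would evaluate (9), (10) or (11) for the given cosmology, and `inv_cov` would be the corrected inverse covariance of Sect. 4.1.2.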

5.1. Quadrupole moments 

In addition to contour plots we illustrate the information content
of a data vector by calculating the determinant of the
quadrupole moment of the posterior likelihood (Kilbinger &
Schneider 2004)

$$Q_{ij} = \int\mathrm{d}^2\pi\,P_{\rm PL}(\pi_1, \pi_2)\left(\pi_i - \pi_i^{\rm f}\right)\left(\pi_j - \pi_j^{\rm f}\right), \qquad (29)$$

with $\pi_1$ and $\pi_2$ as the varied parameters and $\pi_i^{\rm f}$ as the parameter of
the fiducial model. The calculation of $Q_{ij}$ assumes a posterior
likelihood in a two-dimensional parameter space; when considering
more than two varied parameters we calculate the $Q_{ij}$
for the marginalized posterior likelihood (see Sect. 5.3). The
determinant is given by

$$q = \sqrt{\det Q_{ij}} = \sqrt{Q_{11}Q_{22} - Q_{12}^2}. \qquad (30)$$

Tighter constraints on the parameters correspond to a smaller
value of $q$. Due to its non-linearity in the log-likelihood, $q$ is biased
(Sect. 4.1.2). The amount of bias varies depending on the
number of independent realisations from which the covariance
matrix is estimated, and we examine this effect in a similar way
as for the covariance matrices in Sect. 4.1.2. For six different
numbers of independent realisations we perform a likelihood
analysis in a two-parameter space ($\Omega_{\rm m}$ vs. $\sigma_8$) and calculate
$q$ for all three cosmic shear measures. The result is plotted in
Fig. 3. One clearly sees that the dependence of $q$ on the number of
realisations is much weaker than the difference between
the $q$ of different cosmic shear measures. Therefore, the bias
is small and we can confidently use $q$ to compare the relative
information content of the different data vectors.
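
For a posterior tabulated on a regular grid, Eqs. (29) and (30) reduce to weighted sums. The following is a sketch under that assumption; the function name `quadrupole_q` is hypothetical.

```python
import numpy as np

def quadrupole_q(post, grid1, grid2, fid1, fid2):
    """Square root of det(Q_ij), Eqs. (29)-(30), for a posterior on a regular 2D grid.

    post         : (len(grid1), len(grid2)) array of posterior values
    grid1, grid2 : parameter values along the two axes
    fid1, fid2   : fiducial values of the two parameters
    """
    post = np.asarray(post, dtype=float)
    post = post / post.sum()             # discrete normalisation; the cell area cancels
    d1 = np.asarray(grid1, dtype=float)[:, None] - fid1
    d2 = np.asarray(grid2, dtype=float)[None, :] - fid2
    q11 = np.sum(post * d1 * d1)
    q22 = np.sum(post * d2 * d2)
    q12 = np.sum(post * d1 * d2)
    return np.sqrt(q11 * q22 - q12 ** 2)
```

For a Gaussian posterior centred on the fiducial model, $Q_{ij}$ equals the parameter covariance, so $q$ is the square root of its determinant and is proportional to the area of the confidence ellipses.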
Fig. 5. The likelihood contours when varying only two parameters,
while the others are fixed to the fiducial values. The contours
contain 68.3%, 95.4% and 99.73% of the posterior likelihood. We consider
three parameter spaces, from top to bottom: $\sigma_8$ vs. $\Omega_{\rm m}$, $\Gamma$ vs. $\Omega_{\rm m}$, $z_0$
vs. $\Omega_{\rm m}$. The constraints of $\boldsymbol{\xi}$ are shown on the left, $\mathcal{N}$ is plotted in
the middle and the results of $\langle M_{\rm ap}^2\rangle$ are shown on the right.

Fig. 3. The $q$ of $\boldsymbol{\xi}$, $\mathcal{N}$ and $\langle M_{\rm ap}^2\rangle$ depending on the number of independent
realisations from which the covariance matrix is estimated. As a
parameter space we chose $\Omega_{\rm m}$ vs. $\sigma_8$. The deviation of $q$ belonging to
different numbers of realisations is much smaller than the difference
of $q$ of different data vectors.

5.2. Variations of two parameters 

Fig. 4. Here we plot $q$ of the combined data vector when varying $\theta_0$
of the additional $\xi_+$ data point. We calculate $q$ for 35 different added
$\xi_+(\theta_0)$ and show the behaviour in three different parameter spaces. $q$
of the combined data vector can be optimized with respect to $\theta_0$ and
the optimal values are 7.8' ($\Gamma$ vs. $\Omega_{\rm m}$), 12.9' ($\sigma_8$ vs. $\Omega_{\rm m}$) and I'.Q ($z_0$ vs.
$\Omega_{\rm m}$). These values are the minima of a polynomial fit through the data
points.

The likelihood analysis in this section is performed in a two-dimensional
parameter space; all other cosmological parameters
are fixed to the fiducial values. Before comparing the three
data vectors we optimize $\mathcal{N}$ with respect to the $\theta_0$-value of the
added 2PCF data point. We add 35 different $\xi_+(\theta_0)$ covering
a range $\theta_0 \in [0.2', 200.0']$ and calculate $q$. Fig. 4 illustrates
the results of this optimization for three different pairs of parameters
($\Gamma$ vs. $\Omega_{\rm m}$, $\sigma_8$ vs. $\Omega_{\rm m}$, $z_0$ vs. $\Omega_{\rm m}$). For all parameter
combinations considered the optimal $\theta_0$ is close to 10'. This can
be explained from the behaviour of the covariance matrix. For
small angular scales the covariance is dominated by shot noise,
whereas for large angular scales the signal of $\xi_+$ becomes very
small. In both cases the signal-to-noise ratio is lower than at
medium angular scales, where we find the minimum of $q$. In
our further analysis we always choose the optimal 2PCF data
point for the combined data vector. The results are illustrated
by contour plots (Fig. 5) and the corresponding values of $q$
are summarized in Table 1. Here, we also list the results for
two additional parameter combinations, $\sigma_8$ vs. $\Gamma$ and $z_0$ vs.
$\sigma_8$, not shown in Fig. 5. One clearly sees that the 2PCF data
vector gives the tightest constraints on cosmological parameters,
whereas constraints from the aperture mass dispersion are
weaker. Although not quite matching the amount of information
of $\boldsymbol{\xi}$, the combined data vector $\mathcal{N}$ is a substantial improvement
compared to $\langle M_{\rm ap}^2\rangle$. This result is consistent for all parameter
combinations we examine; nevertheless the amount of
the improvement varies. We calculate the difference in information
of $\boldsymbol{\xi}$ and $\mathcal{N}$ relative to $\langle M_{\rm ap}^2\rangle$ and denote these values
$\Delta\xi$ and $\Delta\mathcal{N}$ (Table 1). The parameter combination $\sigma_8$ vs. $\Omega_{\rm m}$
shows a relative improvement of $\Delta\mathcal{N}$ = 26.4%, whereas the
improvement is much less for the case $z_0$ vs. $\sigma_8$ ($\Delta\mathcal{N}$ = 4.1%).
The amount of new information of $\xi_+(\theta_0)$ depends on two main
issues. First, $\xi_+$ integrates over a very broad range of the power
spectrum and it can happen that although $P_{\rm E}$ is sensitive to the


Table 1. This table shows the $q$ of $\boldsymbol{\xi}$, $\mathcal{N}$ and $\langle M_{\rm ap}^2\rangle$ for various parameter
spaces. Parameters over which we marginalize are mentioned
in brackets. The entries are given in units of $10^{-4}$ and only $q$ of the
same parameter space can be compared. $\Delta\mathcal{N}$ ($\Delta\xi$) gives the relative
improvement compared to the $q$ of $\langle M_{\rm ap}^2\rangle$ and this improvement differs
with respect to the parameter space.

parameter space                              <M_ap^2>   N        xi       Delta N   Delta xi
Gamma vs. Omega_m                            14.7       11.7     9.1      20.4 %    38.1 %
sigma_8 vs. Gamma                            23.1       19.0     14.6     17.8 %    36.8 %
sigma_8 vs. Omega_m                          427.1      314.5    220.1    26.4 %    48.5 %
z_0 vs. Omega_m                              46.4       41.0     32.9     11.6 %    29.1 %
z_0 vs. sigma_8                              95.3       91.4     73.2     4.1 %     23.2 %
sigma_8 vs. Omega_m (z_0)                    416.9      313.4    230.0    25.8 %    44.8 %
sigma_8 vs. Omega_m (Gamma)                  780.5      720.9    527.0    7.6 %     32.5 %
Gamma vs. Omega_m (sigma_8)                  93.7       77.6     61.6     17.2 %    34.3 %
sigma_8 vs. Omega_m (Gamma, z_0)             983.8      850.6    623.5    13.5 %    36.6 %


parameters considered, the integral over $P_{\rm E}$ is much less so. For
example, if one varies $\Gamma$, the power spectrum is tilted and looks
significantly different, whereas the corresponding $\xi_+(\theta_0)$ might
be very similar. Second, $\langle M_{\rm ap}^2\rangle$ does not contain information
on small Fourier modes, whereas $\mathcal{N}$ gains information about
these modes from the data point $\xi_+(\theta_0)$. However, in case these
modes of the power spectrum are not sensitive to the parameters
considered, the information contributed by $\xi_+(\theta_0)$ is
mainly redundant, hence $\Delta\mathcal{N}$ is low. For example, varying $\sigma_8$
or $\Omega_{\rm m}$ changes $P_{\rm E}$ similarly, i.e. increasing $\Omega_{\rm m}$ or $\sigma_8$ increases
the amplitude of $P_{\rm E}$ on all Fourier modes. Therefore, the
integration over $P_{\rm E}$ is equally sensitive to parameter variations
as $P_{\rm E}$ itself. Furthermore, the deviation of power spectra with
different values of $\sigma_8$ and $\Omega_{\rm m}$ becomes much more significant
for small Fourier modes. Information on these scales is not
included in $\langle M_{\rm ap}^2\rangle$ but contributed by $\xi_+(\theta_0)$, resulting in a large
$\Delta\mathcal{N}$ (26.4%). In contrast to this, a variation of $z_0$ changes the
power spectrum very little; especially on low $\ell$-scales the
dependence is weak. Accordingly, the gain in information for the
cases $z_0$ vs. $\Omega_{\rm m}$ and $z_0$ vs. $\sigma_8$ is rather small.
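
The relative improvements quoted in Table 1 are consistent with the fractional reduction of $q$ with respect to the aperture mass data vector. This definition is inferred here from the tabulated numbers rather than stated explicitly in the text, and the helper name `relative_improvement` is hypothetical; the short sketch below applies it to three rows of the table.

```python
def relative_improvement(q_map, q_other):
    """Relative reduction of q with respect to the aperture mass data vector, in per cent."""
    return 100.0 * (q_map - q_other) / q_map

# q values (units of 1e-4) copied from three rows of Table 1:
# (aperture mass dispersion, combined vector N, 2PCF)
rows = {
    "Gamma vs. Omega_m": (14.7, 11.7, 9.1),
    "sigma_8 vs. Omega_m": (427.1, 314.5, 220.1),
    "z_0 vs. sigma_8": (95.3, 91.4, 73.2),
}
for name, (q_map, q_n, q_xi) in rows.items():
    print(f"{name}: Delta N = {relative_improvement(q_map, q_n):.1f} %, "
          f"Delta xi = {relative_improvement(q_map, q_xi):.1f} %")
```

Evaluated on these rows, the definition agrees with the tabulated percentages to the quoted precision.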



5.3. Variation of three and four parameters - 
marginalization 

In this section we perform a likelihood analysis in three- and
four-dimensional parameter space. To illustrate the results in
two-dimensional contour plots we define the marginalized posterior
likelihood

$$P_{\rm mPL}(\boldsymbol{\pi}_{12}|\hat{\boldsymbol{\xi}}) = \int\mathrm{d}\pi_3\int\mathrm{d}\pi_4\,P_{\rm PL}(\boldsymbol{\pi}_{1234}|\hat{\boldsymbol{\xi}}), \qquad (31)$$

which is obtained by integrating the posterior likelihood
over the marginalized parameters. The marginalized likelihood
is also biased due to its non-linearity in the log-likelihood. To
examine whether this bias affects our results significantly we
perform the same experiment as done for $q$ in two-dimensional
parameter space. We calculate $q$ for our three different measures
depending on the number of realisations. The results are
shown in Fig. 6; again, the bias due to the process of marginalization
is small compared to the difference of $q$ of our three
data vectors, showing that also in the marginalized case we can
use $q$ to compare the information content. We also optimize
the combined data vector, similar to Sect. 5.2, and summarize
the results in Table 2. For the same reasons as in the previous
section the optimal angular scale of the added data point is
again around 10' and we choose this optimized $\mathcal{N}$ for the likelihood
analysis in three- and four-dimensional parameter space.
The results of the likelihood analysis are comparable to those
obtained in two-dimensional parameter space. The $q$ (see Table
1) are larger and the contours (see Fig. 7) are broader.

Fig. 6. This figure shows the $q$ of $\boldsymbol{\xi}$, $\mathcal{N}$ and $\langle M_{\rm ap}^2\rangle$ for the marginalized
posterior likelihood depending on different numbers of realisations.
The parameter space is $\sigma_8$ vs. $\Omega_{\rm m}$ (marginalized over $\Gamma$ and $z_0$). The
deviation of $q$ belonging to different numbers of realisations is much
smaller than the deviation of $q$ of different measures. The lines
indicate the fit through the data points.

Table 2. The optimal angular separation $\theta_0$ for the added $\xi_+$ in the
combined data vector $\mathcal{N}$. The values are comparable to the similar
analysis in two-parameter space (see Fig. 4).

parameter space                                          optimal value theta_0
Gamma vs. Omega_m (marginalized over sigma_8)            9.1'
sigma_8 vs. Omega_m (marginalized over z_0)              13.0'
sigma_8 vs. Omega_m (marginalized over Gamma and z_0)    12.0'

Fig. 7. The likelihood contours of $\boldsymbol{\xi}$, $\mathcal{N}$ and $\langle M_{\rm ap}^2\rangle$ in three- and
four-dimensional parameter space. From top to bottom we see $\Gamma$ vs. $\Omega_{\rm m}$
marginalized over $\sigma_8$, $\sigma_8$ vs. $\Omega_{\rm m}$ marginalized over $z_0$ and $\sigma_8$ vs. $\Omega_{\rm m}$
marginalized over $\Gamma$ and $z_0$. The contours contain 68.3%, 95.4% and
99.73% of the marginalized posterior likelihood. The small scatter of
the contours in the last plot is due to a lower resolution of the grid in
four-dimensional parameter space compared to the grids in two- and
three-dimensional parameter space. The contours, although broader, are
comparable to those given in Fig. 5.

Again, the relative improvement $\Delta\mathcal{N}$ depends on the parameter space
considered. For $\sigma_8$ vs. $\Omega_{\rm m}$ marginalized over $z_0$ the improvement
is very high (25.8%) but becomes much lower for $\sigma_8$
vs. $\Omega_{\rm m}$ marginalized over $\Gamma$. This can be explained by looking
at how $P_{\rm E}$ changes with respect to the variation in parameter
space. For the combination $\sigma_8$ vs. $\Omega_{\rm m}$, we already explained
this in Sect. 5.2, and the influence of $z_0$ on $P_{\rm E}$ is quite similar.
Increasing $z_0$ also increases $P_{\rm E}$, although the effect is not
very large. Therefore, the improvement of $\sigma_8$ vs. $\Omega_{\rm m}$ marginalized
over $z_0$ is comparable to the non-marginalized case. When
varying the shape parameter $\Gamma$, $P_{\rm E}$ is tilted and this dependence
of $P_{\rm E}$ on $\Gamma$ is different compared to the other three parameters.
Scales of $P_{\rm E}$ which are most sensitive to $\Gamma$ differ from scales
sensitive to $\sigma_8$, $\Omega_{\rm m}$ and $z_0$, and the same argument holds for
the scales of the added $\xi_+(\theta_0)$. Therefore, the optimal $\theta_0$ for the
case $\sigma_8$ vs. $\Omega_{\rm m}$ marginalized over $\Gamma$ is a compromise and the
relative improvement is much lower (7.6%) compared to $\sigma_8$
vs. $\Omega_{\rm m}$ marginalized over $z_0$ (25.8%).

6. Simulation of a B-mode contamination on small 
angular scales 

In this section we simulate a B-mode contamination of $\xi$, N
and $\langle M_{\rm ap}^2 \rangle$ on small angular scales. At present
there is no model available which describes B-modes; taking into
account that B-modes most likely occur on small angular scales
(e.g. Hoekstra et al. 2002; van Waerbeke et al. 2005; Massey et al.
2007), we use the following arbitrary model for a B-mode power
spectrum

$$ P_B(\ell) = 0.2\, P_E(\ell)\, e^{-\ell_B/\ell} , \tag{32} $$

where $\ell_B$ defines a scale beyond which the B-mode contamination
decreases quickly. The B-mode contribution to $\xi$ can be
calculated from (9) and (10) by assuming $P_E = 0$. In order to
calculate the covariance $C_B$ we assume that the probability
distribution of B-modes can be described by a Gaussian random
field. This assumption enables us to calculate the covariance
directly in terms of the power spectrum $P_B$ (Joachimi et al. 2007).
The covariance of the 2PCF corresponding to the B-mode
contribution is given by

C++,. = j dm Q (m i )J {m j )lpl(o + 'PB(0^j , 

where $A$ defines the volume of the survey, $\sigma_\epsilon$ the intrinsic
ellipticity noise and $n$ the number density of the source galaxies.
According to the corresponding values of the ray-tracing
simulations we choose $\sigma_\epsilon = 0.3$ and $n = 25/{\rm arcmin}^2$. Note that
Cg^ = Cg^ ( . The pure shot noise term of Cg* is contained 
in Cg*, in case of Cg~ this term vanishes anyway. We further 
assume that the contamination is independent of the lensing 
signal, hence there is no correlation between E- and B-modes. 
This assumption does not hold in case the B-mode signal is 
caused by insufficient PSF correction or other systematics, and 
we will comment on this at the end of this section. For the case 
that B-modes are created independently from E-modes we can 
define a combined E/B-mode covariance matrix as 

$$ C_{\rm tot} = C_E + C_B . \tag{33} $$
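As a rough numerical illustration of the contamination model, the following sketch evaluates a B-mode spectrum of the form $0.2\,P_E(\ell)\,e^{-\ell_B/\ell}$, which is our reading of Eq. (32). The power-law `p_e_toy`, the value `ell_b = 2000`, and all function names are hypothetical placeholders; they are not the E-mode spectrum or the scale used in the analysis.

```python
import numpy as np


def p_e_toy(ell):
    """Toy E-mode power spectrum (hypothetical power law, illustration only)."""
    ell = np.asarray(ell, dtype=float)
    return 1e-9 * (ell / 1000.0) ** (-1.2)


def p_b_model(ell, ell_b=2000.0, amplitude=0.2):
    """B-mode spectrum assumed as amplitude * P_E(ell) * exp(-ell_b / ell)."""
    ell = np.asarray(ell, dtype=float)
    return amplitude * p_e_toy(ell) * np.exp(-ell_b / ell)


# Large angular scales (small ell) are strongly suppressed, while on
# small angular scales (large ell) the ratio approaches the amplitude 0.2.
ells = np.array([100.0, 2000.0, 1.0e5])
ratio = p_b_model(ells) / p_e_toy(ells)
print(ratio)
```

In this sketch the ratio $P_B/P_E$ depends only on $\ell_B/\ell$, so the choice of the toy E-mode spectrum does not affect where the contamination sets in.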



Recall that $C_E$ is estimated from ray-tracing simulations
whereas $C_B$ is calculated by assuming a Gaussian random field.
The correction factor needed to invert estimated matrices
correctly (see Sect. 4.1.1) must only be applied to $C_E$, not to
$C_B$. We use the iterative approach of Miller (1981) to decompose
the inverse of this sum of matrices into a sum of
inverse matrices and then apply the correction factor only to
$C_E^{-1}$. From now on, the procedure of the comparison is similar
to Sect. 5.2 and Sect. 5.3. We calculate $C_M$ and $C_N$ from
$C_\xi$ and perform a likelihood analysis. We only show the results
for the $\Omega_m$ vs. $\sigma_8$ plane (see Fig. 8). The black dots
indicate the fiducial cosmological model, and in case of the 2PCF data
vector the maximum of the posterior likelihood deviates significantly
from the fiducial parameters. $\langle M_{\rm ap}^2 \rangle$ and N are
much more robust against the contamination. As expected, the
maximum of the posterior likelihood of the aperture mass dispersion
matches exactly the fiducial parameters, and in case of N
the discrepancy is negligibly small. Furthermore, the combination
still gives tighter constraints on the parameters.

As already mentioned above, the assumption of B-modes being
independent of the E-mode signal does not always hold. In case
the contamination affects both the E-mode and the B-mode signal, the
impact on the parameter constraints of the different measures is
hard to quantify. In case one measures a B-mode signal, it is a
common approach to assume that the E-mode signal is contaminated
in a similar way, hence one correspondingly increases
its error bars. Although this assumption is sensible, there are
possible scenarios where the amount of contamination in E-
and B-modes differs and the E-mode contamination cannot be
quantified at all. Under the assumption that B-modes trace the
scales of the E-mode contamination it is reasonable to exclude
those scales from the likelihood analysis. This can be done using
$\langle M_{\rm ap}^2 \rangle$ or N, but $\xi$ cannot avoid the
contamination due to its broad filter functions.
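The inversion step described above (correction factor applied to $C_E^{-1}$ only, then inversion of the sum $C_E + C_B$) can be sketched as follows. This is a minimal illustration in the spirit of the rank-one update scheme of Miller (1981), not the code used for the analysis: splitting $C_B$ into rank-one terms via its eigen-decomposition is our own choice, and the matrices `c_e`, `c_b` and the factor `alpha` are hypothetical stand-ins.

```python
import numpy as np


def inverse_of_sum(g_inv, h):
    """Return (G + H)^-1 given G^-1, using successive rank-one updates.

    H is assumed symmetric and is split as H = sum_k lam_k v_k v_k^t.
    Each step applies
        (C + E)^-1 = C^-1 - C^-1 E C^-1 / (1 + tr(C^-1 E))
    for a rank-one matrix E.
    """
    eigvals, eigvecs = np.linalg.eigh(h)
    c_inv = g_inv.copy()
    for lam, v in zip(eigvals, eigvecs.T):
        if abs(lam) < 1e-14:          # skip numerically vanishing terms
            continue
        e_k = lam * np.outer(v, v)    # rank-one piece of H
        g = np.trace(c_inv @ e_k)
        c_inv = c_inv - (c_inv @ e_k @ c_inv) / (1.0 + g)
    return c_inv


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
c_e = x @ x.T + 4.0 * np.eye(4)       # stand-in for the estimated C_E
y = rng.normal(size=(4, 4))
c_b = 0.1 * (y @ y.T)                 # stand-in for the Gaussian C_B

alpha = 0.9                           # hypothetical correction factor
c_e_inv = alpha * np.linalg.inv(c_e)  # correction applied to C_E^-1 only
c_tot_inv = inverse_of_sum(c_e_inv, c_b)
```

Because the update starts from the already corrected $C_E^{-1}$, the factor never touches $C_B$, which is the property the text asks for.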

7. Conclusions 

Although the 2PCF and the aperture mass dispersion are both
filtered versions of the power spectrum, the former contains more
information on $P_E$ than the latter. The reason for this is that $\xi$
samples the power spectrum over a much broader range and also
collects information on scales which are larger than the size of
the survey. $\langle M_{\rm ap}^2 \rangle$ lacks this large-scale
information, but yields highly localized information on $P_E$.
Nevertheless $\langle M_{\rm ap}^2 \rangle$ has other advantages. First,
due to its narrow filter function the data points are much less
correlated compared to the 2PCF data points. This leads to a mainly
diagonal covariance matrix, which is numerically more stable during
the inversion process in a likelihood analysis. Second, when
considering higher-order statistics, the aperture mass statistics are
much easier to handle than the three-point correlation function
(Schneider et al. 2005), and third, the aperture mass dispersion is
sensitive to E-modes only. Based on these considerations we create
the combined data vector N, which preserves the advantages of
$\langle M_{\rm ap}^2 \rangle$ and additionally provides large-scale
information on $P_E$. This data vector can be optimized with respect
to the angular scale of the added data point $\xi_+(\theta_0)$, but
this optimization very likely depends on the survey geometry and must
be performed for each survey separately. We compare the three data
vectors in a detailed likelihood analysis and find that the combined
data vector is a strong improvement in information content compared
to $\langle M_{\rm ap}^2 \rangle$. However, the amount of improvement
depends on the parameter space considered, more precisely, on the
dependence of $P_E$ on variation of those parameters. The combined
data vector N also maintains the other advantages of the aperture mass
dispersion. Its covariance matrix is almost diagonal; even the
cross terms $C(M(\theta_k), \xi_+(\theta_0))$ are much smaller compared
with the off-diagonal terms of $C_\xi$. Comparing the information
content of $\xi$ and N, $\xi$ gives tighter constraints if the shear
signal only consists of E-modes. In the more realistic case, when
B-modes are also present, $\xi$ is biased whereas N is hardly affected
and still gives tighter constraints on cosmological parameters
compared to $\langle M_{\rm ap}^2 \rangle$.

Fig. 8. This plot shows the likelihood contours for the case that the shear signal is contaminated with B-modes. We only consider a two-
dimensional parameter space ($\sigma_8$ vs. $\Omega_m$) and the contours again contain 68.3 %, 95.4 %, 99.73 % of the posterior likelihood. The black dot in
each plot indicates the fiducial model. $\xi$ gives biased constraints, N and $\langle M_{\rm ap}^2 \rangle$ are hardly contaminated.

Appendix A: Comparison of two measures 

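We compare the information content of two arbitrary data vectors,
referring to them as primary data vector $p$ and secondary
data vector $s$. We further assume that $s$ can be calculated from
$p$ by a transfer matrix $A$ (dimension $n \times m$), with arbitrary $n$
and $m$,

$$ p = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{pmatrix}
   \quad \text{and} \quad
   s = \begin{pmatrix} s_1 \\ \vdots \\ s_n \end{pmatrix}
   \quad \text{with} \quad s = A\,p . \tag{A.1} $$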



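We define the covariance matrices of these data vectors as

$$ C_p = \left\langle (p - \hat{p})(p - \hat{p})^t \right\rangle , \tag{A.2} $$

$$ C_s = \left\langle (s - \hat{s})(s - \hat{s})^t \right\rangle , \tag{A.3} $$

where $\hat{p}$ ($\hat{s}$) denotes the estimated and $p$ ($s$) the true values of
the primary (secondary) measure. Using (A.1) we can relate both
covariances through

$$ C_s = A\, C_p\, A^t . \tag{A.4} $$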
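As a numerical illustration of (A.4), the following minimal sketch draws mock realizations of a primary data vector, maps them with a transfer matrix, and compares the sample covariances. The matrix `a`, the dimensions, and the mock data are hypothetical and chosen for illustration only; they are not the measures or simulations used in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, n_real = 5, 3, 2000                 # hypothetical dimensions and sample size

a = rng.normal(size=(n, m))               # hypothetical transfer matrix A (n x m)

# Correlated mock realizations of the primary data vector, one per row.
p_samples = rng.normal(size=(n_real, m)) @ rng.normal(size=(m, m))
s_samples = p_samples @ a.T               # s = A p for every realization

c_p = np.cov(p_samples, rowvar=False)     # sample covariance of p (m x m)
c_s = np.cov(s_samples, rowvar=False)     # sample covariance of s (n x n)

print(np.allclose(c_s, a @ c_p @ a.T))
```

Since the map $s = A\,p$ is linear, the relation holds for the sample covariances as well, up to floating-point rounding.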

The transformation matrix $A$ has to be of rank $A = n$, otherwise
the covariance matrix of the secondary data vector
$C_s = A\, C_p\, A^t$ is singular and not invertible. Furthermore, as $A$ is of
dimension $n \times m$, rank $A \leq m$, implying $n \leq m$. We take the
$\chi^2$-functions as a measure for the information content,

$$ \chi_p^2 = \Delta_p^t\, C_p^{-1}\, \Delta_p
   \quad \text{and} \quad
   \chi_s^2 = \Delta_s^t\, C_s^{-1}\, \Delta_s , \tag{A.5} $$



where in our case $\Delta_p = p^{\rm f} - p_\pi$ ($\Delta_s = s^{\rm f} - s_\pi$) denotes the
difference between the fiducial data vector $p^{\rm f}$ ($s^{\rm f}$) and the data vector
$p_\pi$ ($s_\pi$) depending on the parameter vector $\pi$. In case $\chi^2$ is
minimal, the posterior likelihood of the corresponding $\pi$ being the
correct parameter vector is maximized. The difference between
$\chi_p^2$ and $\chi_s^2$ characterizes which probability function has a larger
curvature, i.e. which data vector gives tighter constraints in
parameter space. Therefore the information content of primary
and secondary data vector can be compared by calculating

$$ \chi_p^2 - \chi_s^2 = \Delta_p^t\, C_p^{-1}\, \Delta_p
   - \Delta_p^t\, A^t \left( A\, C_p\, A^t \right)^{-1} A\, \Delta_p \tag{A.6} $$



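for arbitrary $\Delta_p$. In case this difference is always positive we
can conclude that the primary data vector gives tighter
constraints on parameters. We can always find transformation
matrices $V$ (dimension $m \times m$) and $U$ (dimension $n \times n$) to rewrite
the transfer matrix $A$ as an $n \times m$ matrix

$$ \begin{pmatrix} E_n & 0 \end{pmatrix} = S = U A V^{-1}
   \quad \Longrightarrow \quad A = U^{-1} S V , \tag{A.7} $$

where $E_n$ denotes the $n \times n$ identity matrix.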



We can directly calculate these transformation matrices as
a multiplication of elementary matrices (Fischer 1997a).
Inserting (A.7) into (A.6) we derive after some lengthy but
straightforward calculation

$$ \chi_p^2 - \chi_s^2 = {\Delta_p'}^{t}\, C'^{-1}\, \Delta_p'
   - {\Delta_p'}^{t}\, S^t \left( S\, C'\, S^t \right)^{-1} S\, \Delta_p' \tag{A.8} $$

with

$$ C' = V\, C_p\, V^t \quad \text{and} \quad \Delta_p' = V \Delta_p . \tag{A.9} $$



For simpler notation we discard all primes from now on. We define

$$ C = \begin{pmatrix} C_1 & C_2 \\ C_2^t & C_3 \end{pmatrix} ,
   \qquad
   C^{-1} = D = \begin{pmatrix} D_1 & D_2 \\ D_2^t & D_3 \end{pmatrix} , \tag{A.10} $$

with $C_1$ being an $n \times n$ matrix, and calculate

$$ S^t \left( S\, C\, S^t \right)^{-1} S
   = \begin{pmatrix} C_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} . \tag{A.11} $$



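Using (A.10) and (A.11) we can rewrite (A.8) as

$$ \chi_p^2 - \chi_s^2 = \Delta_p^t
   \begin{pmatrix} D_1 - C_1^{-1} & D_2 \\ D_2^t & D_3 \end{pmatrix}
   \Delta_p . \tag{A.12} $$

From $C D = E_m$ it follows that

$$ D_1 - C_1^{-1} = - C_1^{-1} C_2 D_2^t \tag{A.13} $$

and

$$ C_1 D_2 = - C_2 D_3 , \quad \text{i.e.} \quad C_1^{-1} C_2 = - D_2 D_3^{-1} . \tag{A.14} $$

Inserting (A.14) into (A.13) we can rewrite (A.12) as

$$ \chi_p^2 - \chi_s^2 = \Delta_p^t
   \begin{pmatrix} D_2 D_3^{-1} D_2^t & D_2 \\ D_2^t & D_3 \end{pmatrix}
   \Delta_p . \tag{A.15} $$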



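$C$ is positive definite and symmetric, and so is its inverse $D$;
therefore $D_3$, as a principal submatrix of $D$, is positive definite and
symmetric, and the inverse $D_3^{-1}$ also has these favorable properties
(Anderson 2003). Hence, we can decompose $D_3 = L L^t$ and finish our
calculation as follows:

$$ \chi_p^2 - \chi_s^2 = \Delta_p^t
   \begin{pmatrix} D_2 (L^t)^{-1} \\ L \end{pmatrix}
   \begin{pmatrix} L^{-1} D_2^t & L^t \end{pmatrix}
   \Delta_p \tag{A.16} $$

$$ = \Delta_p^t\, T^t\, T\, \Delta_p = \| T \Delta_p \|^2 \geq 0 , \tag{A.17} $$

where $T = \begin{pmatrix} L^{-1} D_2^t & L^t \end{pmatrix}$ denotes the second factor in (A.16).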
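The block manipulations leading from (A.12) to (A.17) can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrix `c` is a randomly generated symmetric positive definite stand-in for $C$, and the dimensions `m` and `n` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 2                                   # hypothetical dimensions, n < m

x = rng.normal(size=(m, m))
c = x @ x.T + m * np.eye(m)                   # symmetric positive definite stand-in for C
d = np.linalg.inv(c)                          # D = C^-1

c1 = c[:n, :n]
d1, d2, d3 = d[:n, :n], d[:n, n:], d[n:, n:]

# Upper-left block of (A.12) versus its rewritten form in (A.15).
block_a12 = d1 - np.linalg.inv(c1)
block_a15 = d2 @ np.linalg.inv(d3) @ d2.T

# Cholesky factor L with D_3 = L L^t, and T = (L^-1 D_2^t, L^t) as in (A.16).
low = np.linalg.cholesky(d3)
t = np.hstack([np.linalg.solve(low, d2.T), low.T])

# Full matrix of (A.15), to be compared with T^t T.
full_a15 = np.block([[block_a15, d2], [d2.T, d3]])

print(np.allclose(block_a12, block_a15), np.allclose(t.T @ t, full_a15))
```

The matrix $T$ has only $m - n$ rows, so $T^t T$ has rank at most $m - n$; this is the part of the primary information that the secondary data vector cannot access.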



We will now examine the case when $\chi_p^2 - \chi_s^2 = 0$. The
information content of primary and secondary measure is considered
to be equal if and only if this equality holds for all data vectors
$\Delta_p$. If there is at least one $\Delta_p$ for which $\chi_p^2 - \chi_s^2 > 0$, the primary
measure contains more information. The difference of the two
$\chi^2$-values is given by (A.6). In case it is zero for all $\Delta_p$,

$$ C_p^{-1} = A^t \left( A\, C_p\, A^t \right)^{-1} A \tag{A.18} $$



must hold (Fischer 1997b). $C_p$ is of rank $m$, hence the left-hand
side of (A.18) also has rank $m$, and so must the right-hand side.
Then $A$ must have rank $m$; since rank $A = n \leq m$, this means
$n = m$, so $A$ is a square $m \times m$ matrix, which is of course
invertible. This result is intuitively clear: if one is able to calculate
$\Delta_s$ from $\Delta_p$ and vice versa, the information content should
be the same. We can summarize the results of the above
calculation in two statements:

1. If a secondary measure can be calculated from a primary
measure by a matrix $A$ as described in (A.1), the secondary
measure has less or equal information.

2. The amount of information is equal in case the rank of $A$
equals the dimension of the primary data vector ($m$), implying
that $A$ is invertible.
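Both statements can be illustrated with a small numerical experiment based on (A.6). This is a minimal sketch under stated assumptions: the covariance `c_p`, the transfer matrices `a_reduced` and `a_square`, and the difference vectors `deltas` are randomly generated, hypothetical stand-ins rather than quantities from the cosmic shear analysis.

```python
import numpy as np


def chi2_difference(c_p, a, delta_p):
    """Evaluate chi_p^2 - chi_s^2 of (A.6) for a secondary vector s = A p."""
    chi2_p = delta_p @ np.linalg.solve(c_p, delta_p)
    delta_s = a @ delta_p
    chi2_s = delta_s @ np.linalg.solve(a @ c_p @ a.T, delta_s)
    return chi2_p - chi2_s


rng = np.random.default_rng(3)
m, n = 5, 3                                         # hypothetical dimensions

x = rng.normal(size=(m, m))
c_p = x @ x.T + m * np.eye(m)                       # positive definite stand-in for C_p

a_reduced = rng.normal(size=(n, m))                 # n < m: information can be lost
a_square = np.eye(m) + 0.1 * rng.normal(size=(m, m))  # invertible m x m matrix

deltas = rng.normal(size=(100, m))                  # arbitrary difference vectors
diff_reduced = np.array([chi2_difference(c_p, a_reduced, dp) for dp in deltas])
diff_square = np.array([chi2_difference(c_p, a_square, dp) for dp in deltas])

print(diff_reduced.min(), np.abs(diff_square).max())
```

For the reduced matrix the difference is non-negative for every trial vector (statement 1), while for the invertible square matrix it vanishes up to rounding (statement 2).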